ABSTRACT


Two new classes of tests for regression models, likelihood ratio type tests and tests based on quadratic forms of robust estimators, are introduced. Both can be viewed as generalizations of the classical F-test. By means of the influence function their robustness properties are investigated, and optimally robust tests that maximize the asymptotic power within each class, under the side condition of a bounded influence function, are constructed. Finally, an example based on real data shows that these tests are valuable robust alternatives to the F-test.
1. INTRODUCTION


In this paper we consider the following regression model. Let $\{(x_i, y_i) : i = 1, \dots, n\}$ be a sequence of independent identically distributed random variables such that

$$y_i = x_i^T \theta + e_i, \qquad i = 1, \dots, n, \qquad (1.1)$$

where $y_i$ is the $i$th observation, $x_i \in \mathbb{R}^p$ is the $i$th row (written as a column vector) of the design matrix, $\theta \in \Theta \subset \mathbb{R}^p$ is a $p$-vector of unknown parameters and $e_i \in \mathbb{R}$ is the $i$th error. Suppose that $e_i$ is independent of $x_i$ and is distributed according to a normal $N(0, \sigma^2)$. Moreover, denote by $K(x)$ the distribution of the $x$'s and by $F_\theta(x, y)$ the joint distribution of $(x_i, y_i)$.
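As a concrete illustration of model (1.1), the following Python sketch draws data with an intercept and standard normal covariates (this choice of $K$ is an assumption made here, not taken from the paper) and computes the LS estimate on which the classical procedures rest.

```python
import numpy as np

def simulate_regression(n, theta, sigma=1.0, seed=0):
    """Draw (x_i, y_i), i = 1, ..., n, from y_i = x_i^T theta + e_i (model (1.1))."""
    rng = np.random.default_rng(seed)
    p = len(theta)
    # Design: intercept column plus p - 1 standard normal covariates (an assumption).
    X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
    e = rng.normal(0.0, sigma, size=n)  # e_i ~ N(0, sigma^2), independent of x_i
    return X, X @ theta + e

theta = np.array([1.0, 2.0, -0.5])
X, y = simulate_regression(2000, theta)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimate of theta
print(np.round(theta_ls, 2))
```

With normal errors the LS estimate is close to `theta`; the later sketches show what happens when a few points violate the model.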

Classical estimation and test procedures in regression models are based on the well-known method of least squares (LS). This is mostly justified by the Gauss-Markov theorem, which states the optimality of the LS estimator within the class of all linear unbiased estimators. Linearity is a drastic restriction; many maximum likelihood estimators (for example, assuming a Cauchy distribution for the errors) are not linear. On the other hand, it is known that the LS estimator is optimal in the class of all unbiased estimators if we assume that the errors are normally distributed. Therefore the restriction to linear estimators can be justified only by normality (or simplicity). But the normal model is never exactly true, and in the presence of small departures from the normality assumption on the errors, the LS procedures (estimators and tests) lose efficiency drastically; see Huber (1973), Hampel (1973a, 1978a), Schrader and Hettmansperger (1980), Ronchetti (1982a, b). Thus, one would prefer to have procedures which are only nearly optimal at the normal model but which behave well in a certain neighborhood of it.

Many robust regression estimators have been proposed in recent years. In section 2 we shall review an important class of such estimators, namely the class of M-estimators. Whereas robust estimation theory in regression models has recently received more and more attention (see, for instance, Huber (1973), Bickel (1975), Holland and Welsch (1977), Hampel (1978b), Ruppert and Carroll (1980), Krasker and Welsch (1982), Ronchetti and Rousseeuw (1983), Samarov (1983)), the test problem has been somewhat neglected.

From a robustness point of view, the classical test procedures based on the LS estimators suffer from the same problems as the LS estimators themselves. Although the F-test is moderately robust with respect to the level, it loses power rapidly in the presence of small departures from the normality assumption on the errors. Recently, Schrader and McKean (1977) and Schrader and Hettmansperger (1980) proposed a new class of tests for linear models based on Huber estimates in the full and reduced model, and Sen (1982) found an asymptotically equivalent version of them. Nevertheless, this is only the first step towards a robust version of the F-test. Like Huber estimators, these tests do not overcome the problems caused by situations where the fit is mostly determined by outlying points in the factor space.

The purpose of this paper is twofold. On the one hand, we introduce new classes of tests that generalize two of the classical asymptotically equivalent tests, namely likelihood ratio tests and Wald tests. (The generalization of a third class, C(α) tests, is the subject of a separate paper.) On the other hand, we propose a solution to the inference problem in regression by presenting optimally robust tests that are the natural counterpart of optimally robust estimators and that can be used to construct robust confidence intervals for the parameters.

The paper is organized as follows. Section 2 gives a short overview of bounded-influence estimation in linear models, and section 3 presents the approach to robust testing that we use to construct a robust version of the F-test. In sections 4 and 5 we introduce two new classes of tests, likelihood ratio type tests and tests based on quadratic forms of robust estimators, and we discuss their asymptotic distribution. Each class can be viewed as the natural generalization of a corresponding classical test. In each case the robustness requirement specified in section 3 leads to an optimally robust test procedure which is a valuable robust alternative to the classical one. Finally, in section 6 we illustrate the excellent performance of optimally robust tests by means of an example based on real data.
2. BOUNDED-INFLUENCE ESTIMATORS


In this section we briefly summarize the results on bounded-influence estimation in regression models. Consider the model (1.1).

One way to cope with the problem of nonrobustness of least squares estimators is to study a large class of estimators generalizing LS, and to select more robust procedures in that class. It appears that M-estimators are most appropriate for this purpose. Suppose for simplicity $\sigma = 1$. An M-estimator $T_n$ for the parameter $\theta$ is defined as the solution of the implicit equation

$$\sum_{i=1}^{n} \Psi(x_i, y_i; T_n) = 0, \qquad (2.1)$$

for a suitable class of vector-valued functions $\Psi : \mathbb{R}^p \times \mathbb{R} \times \mathbb{R}^p \to \mathbb{R}^p$. Because of the invariance properties of the regression model, an important role is played by the following special class:

$$\Psi(x, y; \theta) = \eta(x, y - x^T\theta)\, x, \qquad (2.2)$$

where $\eta : \mathbb{R}^p \times \mathbb{R} \to \mathbb{R}$.

There have been several proposals for choosing $\eta$. For a stimulating discussion we refer to the papers by Krasker and Welsch (1982) and Huber (1983). Some choices of $\eta$ are of the form

$$\eta(x, r) = w(x)\, \psi(r \cdot v(x)), \qquad (2.3)$$

where $\psi : \mathbb{R} \to \mathbb{R}$, and $w : \mathbb{R}^p \to \mathbb{R}_+$, $v : \mathbb{R}^p \to \mathbb{R}_+$ (weight functions). Huber (1973) uses $w(x) = v(x) \equiv 1$, and Mallows' and Andrews' proposals set $v(x) \equiv 1$ and $w(x) \equiv 1$, respectively. Hill and Ryan proposed $v(x) = w(x)$ and, finally, Schweppe suggested choosing $v(x) = 1/w(x)$; see Hill (1977), Krasker and Welsch (1982).
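The weighting schemes above can all be written as one function of the form (2.3). In this Python sketch $\psi$ is Huber's $\psi_c$, and the leverage weight `w_x` is a hypothetical choice made only for illustration; for Andrews' proposal $v$ is taken as $1/w$, as in the table of section 4.

```python
import numpy as np

def psi_huber(r, c=1.345):
    """Huber psi-function psi_c(r) = min(c, max(r, -c))."""
    return float(np.clip(r, -c, c))

def eta(x, r, w, v, psi=psi_huber):
    """Form (2.3): eta(x, r) = w(x) * psi(r * v(x))."""
    return w(x) * psi(r * v(x))

def w_x(x):
    """Hypothetical leverage weight: downweights points with large ||x||."""
    return 1.0 / max(1.0, float(np.linalg.norm(x)))

def one(x):
    return 1.0

# (w, v) pairs for the proposals named in the text.
PROPOSALS = {
    "Huber": (one, one),
    "Mallows": (w_x, one),
    "Andrews": (one, lambda x: 1.0 / w_x(x)),
    "Hill and Ryan": (w_x, w_x),
    "Schweppe": (w_x, lambda x: 1.0 / w_x(x)),
}

x_far = np.array([1.0, 3.0, 4.0])  # a leverage point, ||x|| = sqrt(26)
for name, (w, v) in PROPOSALS.items():
    print(f"{name:14s} eta(x, 10) = {eta(x_far, 10.0, w, v):.4f}")
```

Huber's choice caps the residual but not the leverage point, whereas the Mallows and Schweppe forms also shrink its contribution.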

Two tools have been used successfully to study the robustness properties of estimators. The first one, the influence function, was introduced by Hampel (1974) and is essentially the first derivative of an estimator viewed as a functional. It describes the normalized influence of an infinitesimal observation on the estimator. The formal definition is the following. Suppose the estimator $T_n$ can be expressed as a functional $T$ of the empirical distribution function $F_n$, $T_n = T(F_n)$. Then the influence function of $T$ at $F$ is given by

$$IF(x, y; T, F) = \lim_{\varepsilon \to 0} \frac{T((1-\varepsilon)F + \varepsilon \Delta_{(x,y)}) - T(F)}{\varepsilon}, \qquad (2.4)$$

where $\Delta_{(x,y)}$ is the distribution that puts mass 1 at the point $(x, y)$.
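Definition (2.4) can be approximated by taking a small $\varepsilon$ instead of the limit. The Python sketch below does this for two one-dimensional location functionals evaluated at an empirical distribution: the mean (the LS estimate when $x_i \equiv 1$) and a Huber M-estimate of location. The sample, $\varepsilon$ and $c$ are choices made here for illustration.

```python
import numpy as np

def mean_functional(points, weights):
    """T(F) = integral of u dF(u) for a discrete F with the given weights."""
    return np.sum(weights * points) / np.sum(weights)

def huber_location(points, weights, c=1.345):
    """Solve sum_i weights_i * psi_c(points_i - t) = 0 for t by bisection."""
    lo, hi = points.min() - 1.0, points.max() + 1.0
    def score(t):  # nonincreasing in t, positive at lo, negative at hi
        return np.sum(weights * np.clip(points - t, -c, c))
    for _ in range(200):  # enough halvings for double precision
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def approx_influence(functional, sample, z, eps=1e-4):
    """[T((1 - eps) F_n + eps * Delta_z) - T(F_n)] / eps, a finite-eps version of (2.4)."""
    n = len(sample)
    base = functional(sample, np.full(n, 1.0 / n))
    points = np.append(sample, z)
    weights = np.append(np.full(n, (1.0 - eps) / n), eps)
    return (functional(points, weights) - base) / eps

rng = np.random.default_rng(1)
sample = rng.standard_normal(1000)
for z in (2.0, 10.0, 100.0):
    print(z, approx_influence(mean_functional, sample, z),
          approx_influence(huber_location, sample, z))
```

The approximate influence of the mean grows linearly in the contaminating point $z$, while that of the Huber estimate levels off at a finite value.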

The second tool is the change-of-variance function; see Hampel, Rousseeuw, Ronchetti (1981), Ronchetti and Rousseeuw (1983). It can be viewed as the derivative of the asymptotic covariance matrix of the estimator, and describes its infinitesimal stability. From a robustness point of view, a desirable property of these functions is boundedness (in some norm). This means that any (infinitesimal) observation (and therefore any outlier) has a bounded influence on the estimator and on its asymptotic covariance matrix, respectively. Existence conditions and mathematical properties of derivatives of functionals, including the influence function, are discussed extensively in Clarke (1983) and Fernholz (1983).
The influence function of an estimator defined by (2.2) is given by (see Theorem 4.1 below)

$$IF(x, y; \eta, F_\theta) = \eta(x, y - x^T\theta)\, M^{-1} x, \qquad (2.5)$$

where $M = \int \eta'(x, r)\, x x^T \, d\Phi(r)\, dK(x)$ and $\Phi$ denotes the standard normal distribution function. By a suitable choice of $\eta$ we can force $\|IF(x, y; \eta, F_\theta)\|$ to be bounded in $x$ and $y$. Note that $\eta_{LS}(x, r) = r$ defines the LS estimator and

$$IF(x, y; \eta_{LS}, F_\theta) = (y - x^T\theta) \cdot \Big(\int u u^T \, dK(u)\Big)^{-1} x$$

is unbounded in $x$ and $y$.
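The contrast between (2.5) for LS and for a bounded $\eta$ can be checked numerically. The Python sketch below assumes $x = (1, x_2)^T$ with $x_2 \sim N(0,1)$ and uses the Schweppe-type $\eta(x,r) = \psi_c(r\|x\|)/\|x\|$, that is (2.6) with $A = I$ for simplicity (not the optimal $A$); $M$ is estimated by Monte Carlo. Because $|\eta(x,r)| \le c/\|x\|$, the norm of (2.5) cannot exceed $c\,\|M^{-1}\|_2$.

```python
import numpy as np

C = 1.5  # tuning constant (an illustrative choice)

def eta_ls(x, r):
    return r

def eta_schweppe(x, r, c=C):
    """eta(x, r) = psi_c(r * ||x||) / ||x||, Schweppe form with A = I."""
    nx = np.linalg.norm(x)
    return np.clip(r * nx, -c, c) / nx

def influence(eta, x, y, theta, M):
    """Formula (2.5): IF(x, y; eta, F_theta) = eta(x, y - x^T theta) M^{-1} x."""
    return eta(x, y - x @ theta) * np.linalg.solve(M, x)

rng = np.random.default_rng(2)
n = 20000
Xs = np.column_stack([np.ones(n), rng.standard_normal(n)])
rs = rng.standard_normal(n)                       # r ~ N(0, 1) at the model
M_ls = Xs.T @ Xs / n                              # eta' = 1 for LS
inside = np.abs(rs) * np.linalg.norm(Xs, axis=1) < C
M_sw = (Xs * inside[:, None]).T @ Xs / n          # eta' = 1{|r| ||x|| < c}

theta = np.zeros(2)
for t in (1.0, 10.0, 100.0):
    x = np.array([1.0, t])
    y = 5.0 * t                                   # outlying in both x and y
    print(t, np.linalg.norm(influence(eta_ls, x, y, theta, M_ls)),
          np.linalg.norm(influence(eta_schweppe, x, y, theta, M_sw)))
```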

Hampel's optimality criterion is to put a bound on the IF (measured in some norm) and, under this condition, to minimize the trace of the asymptotic covariance matrix of the estimator at the model. The first condition ensures robustness of the estimator, while the second one is an efficiency condition. If the IF is measured by the Euclidean norm, it turns out that the optimally robust estimator within the class (2.2) is the Hampel-Krasker estimator, which is defined by an $\eta$-function of Schweppe's form

$$\eta(x, r) = \frac{\psi_c(r \cdot \|Ax\|)}{\|Ax\|}, \qquad (2.6)$$

where $\psi_c(t) = \min(c, \max(t, -c))$ is the Huber $\psi$-function, the matrix $A$ is defined implicitly by

$$(A^T A)^{-1} = E\big[(2\Phi(c/\|Ax\|) - 1)\, x x^T\big], \qquad (2.7)$$

and $c$ is a positive constant depending on the bound on $\|IF\|$; see Hampel (1978b), Krasker (1980).
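For $p = 1$ and $x \sim N(0,1)$ (simplifying assumptions made here), equation (2.7) reduces to the scalar equation $a^{-2} = E[(2\Phi(c/|ax|) - 1)x^2]$, which can be solved by fixed-point iteration. The Python sketch below does this with midpoint quadrature; it only illustrates (2.7) and is not a general algorithm for the Hampel-Krasker estimator.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)

def normal_grid(half_width=10.0, cells=4000):
    """Midpoint nodes and weights approximating E[g(X)], X ~ N(0, 1)."""
    edges = np.linspace(-half_width, half_width, cells + 1)
    x = 0.5 * (edges[:-1] + edges[1:])  # even cell count: 0 is never a node
    weights = np.exp(-0.5 * x**2) / sqrt(2.0 * np.pi) * (edges[1] - edges[0])
    return x, weights

def krasker_rhs(a, c, grid=None):
    """E[(2 Phi(c / |a x|) - 1) x^2], using 2 Phi(s) - 1 = erf(s / sqrt(2))."""
    x, w = grid if grid is not None else normal_grid()
    return float(np.sum(w * _erf(c / (a * np.abs(x) * sqrt(2.0))) * x**2))

def hampel_krasker_a(c, tol=1e-12, max_iter=1000):
    """Solve a^{-2} = krasker_rhs(a, c) by iterating a <- krasker_rhs(a, c)^{-1/2}."""
    grid = normal_grid()
    a = 1.0
    for _ in range(max_iter):
        a_new = 1.0 / sqrt(krasker_rhs(a, c, grid))
        if abs(a_new - a) < tol:
            return a_new
        a = a_new
    raise RuntimeError("fixed-point iteration did not converge")

for c in (1.0, 1.5, 3.0, 50.0):
    print(c, hampel_krasker_a(c))
```

Starting from $a = 1$ the iterates increase monotonically; as $c \to \infty$ the weights tend to 1 and $a$ tends to $(E x^2)^{-1/2} = 1$, matching the unweighted LS case.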

For an approach to robust statistics using influence functions, we refer to Hampel, Ronchetti, Rousseeuw, Stahel (1984). A critical discussion of bounded-influence estimation in regression can be found in Huber (1983).
3. THE INFINITESIMAL APPROACH TO TESTING


The infinitesimal approach to testing is based on the central notion of the influence function. The extension of this concept to tests has been studied by Rousseeuw and the author; see Ronchetti (1979, 1982a, b), Rousseeuw and Ronchetti (1979, 1981). It turns out that the influence function defined on the test statistic (that is, using (2.4) with $T$ = test statistic) is proportional to the influence of an infinitesimal observation on the level and on the power of the test. Therefore, a test statistic with a bounded influence function ensures stability of the level and of the power of the test, and guarantees robustness of validity and robustness of efficiency.

Independently, Lambert (1981) introduced in 1979 an influence function for the P-value of a test. For an unconditional test this function is proportional to the influence function of the test statistic. Therefore both functions have the same qualitative behaviour as far as boundedness and continuity properties are concerned; cf. Lambert (1981).

Hampel's optimality criterion can be extended to tests as follows.

Find a test which maximizes the asymptotic power within a certain class, under the side condition of a bound on the influence function of the test statistic.

As in the case of estimators, the bound on the influence function guarantees robustness and the maximization of the power guarantees efficiency of the test. We shall use this criterion to select the optimally robust test procedure within the classes of tests defined in the next sections.

For a comparison between different influence functions for tests, see Ronchetti (1982), Field and Ronchetti (1983). The infinitesimal approach is also discussed extensively in Hampel, Ronchetti, Rousseeuw, Stahel (1984).
4. LIKELIHOOD RATIO-TYPE TESTS


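Consider the linear model (1.1) and suppose we want to test the linear hypothesis

$$\ell_j(\theta) = 0, \qquad j = q+1, \dots, p, \qquad (4.1)$$

where $\ell_{q+1}, \dots, \ell_p$ are linearly independent and $0 < q < p$. Through a transformation of the parameter space we can reduce this hypothesis to

$$H_0 : \theta^{(q+1)} = \dots = \theta^{(p)} = 0, \qquad (4.2)$$

where $\theta^{(j)}$ denotes the $j$th component of the vector $\theta$. Let $\Theta_\omega$ be the subspace of $\Theta$ obtained by imposing the condition $H_0$. The classical test for testing $H_0$ is the F-test, which is equivalent to the likelihood ratio test. It rejects $H_0$ for "large" values of the test statistic

$$F_n = \frac{\sum_{i=1}^{n} (y_i - x_i^T (T_\omega)_n)^2 - \sum_{i=1}^{n} (y_i - x_i^T (T_\Omega)_n)^2}{(p-q)\, \hat\sigma^2}, \qquad (4.3)$$

where $(T_\Omega)_n$ and $(T_\omega)_n$ are the LS estimates of $\theta$ in the full ($\Theta$) and reduced ($\Theta_\omega$) model, respectively, and

$$\hat\sigma^2 = \frac{1}{n-p} \sum_{i=1}^{n} (y_i - x_i^T (T_\Omega)_n)^2$$

is the LS (unbiased) estimate of $\sigma^2$.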
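The classical F statistic needs only two LS fits, one in the full and one in the reduced model. The following Python sketch (function names are illustrative) computes it; for $q = p-1$ it must coincide with the square of the usual $t$ statistic for the last coefficient, which gives a simple check.

```python
import numpy as np

def ls_fit(X, y):
    """LS estimate and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def f_statistic(X, y, q):
    """Classical F statistic for H0: theta^(q+1) = ... = theta^(p) = 0."""
    n, p = X.shape
    _, rss_full = ls_fit(X, y)          # full model
    _, rss_red = ls_fit(X[:, :q], y)    # reduced model: first q columns only
    sigma2_hat = rss_full / (n - p)     # unbiased LS estimate of sigma^2
    return (rss_red - rss_full) / ((p - q) * sigma2_hat)

rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 0.5, 0.3]) + rng.standard_normal(n)
F = f_statistic(X, y, q=2)
print(F)
```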
The aim of this section is to define a class of tests that can be viewed as an extension of the log-likelihood ratio test and therefore of the F-test for linear models.

Let us first introduce the class of functions

$$\tau : \mathbb{R}^p \times \mathbb{R} \to \mathbb{R}_+, \qquad (x, r) \mapsto \tau(x, r)$$

with the following properties:

(4.TAU) $\tau(x, r) \ge 0$ for all $x \in \mathbb{R}^p$, $r \in \mathbb{R}$, and $\tau(x, 0) = 0$ for all $x \in \mathbb{R}^p$; $\tau(x, \cdot)$ is differentiable for all $x \in \mathbb{R}^p$.

Let $\eta(x, r) := (\partial/\partial r)\, \tau(x, r)$.

(4.ETA1) Assume:
(i) $\eta(x, \cdot)$ is continuous and odd for all $x \in \mathbb{R}^p$,
(ii) $\eta(x, r) \ge 0$ for all $x \in \mathbb{R}^p$ and $r \in \mathbb{R}_+$.

(4.ETA2) $\eta(x, \cdot)$ is differentiable on $\mathbb{R} \setminus D(x; \eta)$ for all $x \in \mathbb{R}^p$, where $D(x; \eta)$ is a finite set. Let

$$\eta'(x, r) := \begin{cases} (\partial/\partial r)\, \eta(x, r) & \text{if } x \in \mathbb{R}^p,\ r \in \mathbb{R} \setminus D(x; \eta), \\ 0 & \text{otherwise,} \end{cases}$$

and assume $\sup_r |\eta'(x, r)| < \infty$ for all $x \in \mathbb{R}^p$.

We shall also assume the following regularity conditions:

(4.ETA3) (i) $M := E\, \eta'(x, r)\, x x^T$ exists and is nonsingular, and
(ii) $Q := E\, \eta^2(x, r)\, x x^T$ exists and is nonsingular.
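Definition 4.1. The class of tests $\{\tau\}$ is given by test statistics of the form

$$S_n^2(x_1, \dots, x_n; y_1, \dots, y_n) := 2 (p-q)^{-1} n^{-1} \sum_{i=1}^{n} \big(\tau(x_i, r_{i\omega}) - \tau(x_i, r_{i\Omega})\big), \qquad (4.4)$$

where $\tau$ satisfies the conditions (4.TAU), (4.ETA1), (4.ETA2), (4.ETA3),

$$r_{i\omega} = (y_i - x_i^T (T_\omega)_n)/\sigma, \qquad r_{i\Omega} = (y_i - x_i^T (T_\Omega)_n)/\sigma,$$

and $(T_\omega)_n$, $(T_\Omega)_n$ are the M-estimators in the reduced and full model, that is

$$\Gamma((T_\omega)_n) = \min\{\Gamma(\theta) \mid \theta \in \Theta_\omega\}, \qquad (4.5)$$

$$\Gamma((T_\Omega)_n) = \min\{\Gamma(\theta) \mid \theta \in \Theta\}, \qquad (4.6)$$

with

$$\Gamma(\theta) := \sum_{i=1}^{n} \tau\big(x_i, (y_i - x_i^T\theta)/\sigma\big). \qquad (4.7)$$

"Large" values of $S_n^2$ are significant.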
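(In order to give a critical region we shall give the asymptotic distribution of $S_n^2$ under $H_0$; see below.)

$(T_\omega)_n$ and $(T_\Omega)_n$ fulfil the equations

$$\sum_{i=1}^{n} \eta(x_i, r_{i\omega})\, \tilde x_i = 0, \qquad (4.8)$$

$$\sum_{i=1}^{n} \eta(x_i, r_{i\Omega})\, x_i = 0. \qquad (4.9)$$

Note that $\tilde x := (x^{(1)}, \dots, x^{(q)}, 0, \dots, 0)^T$ and $(T_\omega)_n = ((T_\omega)_n^{(1)}, \dots, (T_\omega)_n^{(q)}, 0, \dots, 0)^T$ (under $H_0$ the last $(p-q)$ components of $\theta$ equal 0!).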
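The statistic (4.4) can be sketched in Python for the Huber case $\tau(x,r) = \rho_c(r)$ with $\sigma$ treated as known. The reduced and full fits, which solve (4.8) and (4.9), are computed here by iteratively reweighted least squares; this is a standard algorithm chosen for the sketch, not one prescribed by the paper. With $\tau(r) = r^2/2$, LS fits and $\sigma = \hat\sigma$, the quantity $nS_n^2$ reproduces the classical F statistic exactly.

```python
import numpy as np

def huber_rho(r, c=1.345):
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def huber_irls(X, y, sigma=1.0, c=1.345, iters=500, tol=1e-10):
    """Minimise sum_i rho_c((y_i - x_i^T beta) / sigma) by IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = (y - X @ beta) / sigma
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # psi_c(r) / r
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def s_n_squared(X, y, q, tau, fit, sigma=1.0):
    """(4.4): 2 (p-q)^{-1} n^{-1} sum_i [tau(r_i,omega) - tau(r_i,Omega)]."""
    n, p = X.shape
    r_red = (y - X[:, :q] @ fit(X[:, :q], y)) / sigma   # reduced model
    r_full = (y - X @ fit(X, y)) / sigma                # full model
    return 2.0 / ((p - q) * n) * float(np.sum(tau(r_red) - tau(r_full)))

rng = np.random.default_rng(4)
n = 60
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(n)
y[0] += 15.0                                   # a single gross error
S2_huber = s_n_squared(X, y, q=2, tau=huber_rho, fit=huber_irls)
print(n * S2_huber)
```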


Examples. Define the following functions: $w : \mathbb{R}^p \to \mathbb{R}_+$, $\rho : \mathbb{R} \to \mathbb{R}_+$ and $\psi : \mathbb{R} \to \mathbb{R}$, $r \mapsto \psi(r) := (\partial/\partial r)\rho(r)$. Some choices of $\tau$ are of the form $\tau(x, r) = w(x)\rho(r \cdot v(x))$ for certain functions $w(x)$ and $v(x)$. They correspond to the estimators given in section 2.

    τ(x, r)              η(x, r)              corresponding estimator
    r²/2                 r                    least squares
    ρ(r)                 ψ(r)                 Huber
    w(x)·ρ(r)            w(x)·ψ(r)            Mallows
    w(x)·ρ(r/w(x))       ψ(r/w(x))            Andrews
    ρ(r·w(x))            w(x)·ψ(r·w(x))       Hill and Ryan
    w²(x)·ρ(r/w(x))      w(x)·ψ(r/w(x))       Schweppe
Remark 1. In practice, one usually has to estimate the scale parameter $\sigma$. A suitable way is to estimate $\sigma$ in the full model, taking the median absolute deviation or the Proposal 2 estimate of Huber; see Hampel (1974), Huber (1981), p. 137. More precisely, for a given real function $\chi$, one has to solve (4.8), (4.9) and

$$\sum_{i=1}^{n} \chi(r_{i\Omega}) = 0 \qquad (4.10)$$

with respect to $(T_\omega)_n$, $(T_\Omega)_n$ and $\sigma$.

Since we are interested in the robustness properties of these tests, let us compute the influence function of the test statistic $S_n$. From now on we put for simplicity $\sigma = 1$.
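Remark 1 suggests estimating $\sigma$ from the full-model residuals by the median absolute deviation. A minimal Python sketch follows; the constant $1.4826 \approx 1/\Phi^{-1}(3/4)$ makes the MAD consistent for $\sigma$ at the normal model. Huber's Proposal 2, the other option mentioned, is not shown.

```python
import numpy as np

def mad_scale(residuals):
    """Normalised median absolute deviation of the residuals."""
    r = np.asarray(residuals, dtype=float)
    return 1.4826 * float(np.median(np.abs(r - np.median(r))))

rng = np.random.default_rng(5)
clean = rng.normal(0.0, 2.0, size=100_000)
dirty = clean.copy()
dirty[:5_000] = 1e3                      # 5% gross errors
print(mad_scale(clean), mad_scale(dirty), clean.std(), dirty.std())
```

The MAD barely moves under 5% gross errors, while the standard deviation is ruined.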

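Let $S$, $T_\omega$ and $T_\Omega$ be the functionals corresponding to $S_n$, $(T_\omega)_n$ and $(T_\Omega)_n$ (see Definition 4.1), that is

$$S(F) = \Big\{2(p-q)^{-1} \int \big[\tau(u, v - u^T T_\omega(F)) - \tau(u, v - u^T T_\Omega(F))\big]\, dF(u, v)\Big\}^{1/2}, \qquad (4.11)$$

where $F$ is an arbitrary distribution function on $\mathbb{R}^p \times \mathbb{R}$ and $T_\omega$, $T_\Omega$ fulfil the system of equations

$$\int \eta(u, v - u^T T_\omega(F))\, \tilde u \, dF(u, v) = 0,$$

$$\int \eta(u, v - u^T T_\Omega(F))\, u \, dF(u, v) = 0.$$

(Note that $T_\omega^{(j)}(F) = 0$ for $j = q+1, \dots, p$ and for all $F$, and $S_n = S(F_n)$, $(T_\omega)_n = T_\omega(F_n)$, $(T_\Omega)_n = T_\Omega(F_n)$, where $F_n$ is the empirical distribution function of $(x_i, y_i)$, $i = 1, \dots, n$.)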
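The next proposition gives the influence functions of $T_\omega$, $T_\Omega$ and $S$ at the null hypothesis. (Note that, under the null hypothesis, $\theta = \tilde\theta = (\theta^{(1)}, \dots, \theta^{(q)}, 0, \dots, 0)^T$, so $F_{\tilde\theta}$ is the model distribution under the null hypothesis.)

Theorem 4.1. Assume (4.TAU), (4.ETA1), (4.ETA2) and the following conditions:

(4.IF1) $h(\alpha) := \int \eta(x, y - x^T\alpha)\, x \, dF_\theta(x, y)$ exists for all $\alpha \in \Theta \subset \mathbb{R}^p$, and $(\partial/\partial\alpha)\, h(\alpha)$ exists and is continuous;

(4.IF2) $h(\theta) = 0$.

Then the influence functions of $T_\omega$, $T_\Omega$ and $S$ exist and equal

(i) $IF(x, y; T_\omega, F_\theta) = \eta(x, y - x^T\theta)\, \tilde M^{+} x$,

(ii) $IF(x, y; T_\Omega, F_\theta) = \eta(x, y - x^T\theta)\, M^{-1} x$,

(iii) $IF(x, y; S, F_\theta) = |\eta(x, y - x^T\theta)| \cdot \{x^T (M^{-1} - \tilde M^{+})\, x / (p-q)\}^{1/2}$,

where

$$\tilde M = \begin{pmatrix} M_{(11)} & 0 \\ 0 & 0 \end{pmatrix}$$

and $\tilde M^{+}$ denotes the pseudoinverse of $\tilde M$.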

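Proof. Assertions (i) and (ii) follow from Theorem G11.1 of Stahel (1981, p. 116), with $P = F_\theta$. His conditions "a", "b", "c" follow from (4.IF1), (4.IF2), and condition "d" from (4.ETA3)(i). Finally, condition "e" follows from (4.ETA2), since $\{(x, y) \mid y - x^T\theta \in D(x; \eta)\}$ is a regular hyperplane in his sense (Stahel, 1981, p. 12). To show (iii), denote by $\Delta_{(x,y)}$ the distribution on $\mathbb{R}^p \times \mathbb{R}$ that puts mass 1 at $(x, y)$ and define the $\varepsilon$-contaminated distribution

$$F_{\theta,\varepsilon,(x,y)} = (1-\varepsilon) F_\theta + \varepsilon \Delta_{(x,y)}.$$

After a straightforward computation we get

$$S^2(F_\theta) = 0, \qquad \frac{\partial}{\partial\varepsilon} S^2(F_{\theta,\varepsilon,(x,y)})\Big|_{\varepsilon=0} = 0 \quad \text{(by (4.IF2))},$$

$$\frac{\partial^2}{\partial\varepsilon^2} S^2(F_{\theta,\varepsilon,(x,y)})\Big|_{\varepsilon=0} = 2(p-q)^{-1}\big[-\eta(x, y - x^T\theta)\, x^T IF(x, y; T_\omega, F_\theta) + \eta(x, y - x^T\theta)\, x^T IF(x, y; T_\Omega, F_\theta)\big]$$

$$= 2(p-q)^{-1}\, \eta^2(x, y - x^T\theta)\, \big(x^T M^{-1} x - x^T \tilde M^{+} x\big).$$

Using L'Hôpital's rule twice, we obtain

$$IF(x, y; S, F_\theta) = \lim_{\varepsilon \to 0} \frac{S(F_{\theta,\varepsilon,(x,y)}) - S(F_\theta)}{\varepsilon} = \lim_{\varepsilon \to 0} \Big(\frac{S^2(F_{\theta,\varepsilon,(x,y)})}{\varepsilon^2}\Big)^{1/2} = \Big(\frac{1}{2}\Big[\frac{\partial^2}{\partial\varepsilon^2} S^2(F_{\theta,\varepsilon,(x,y)})\Big]_{\varepsilon=0}\Big)^{1/2}.$$

This completes the proof.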
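Theorem 4.1 (iii) is easy to evaluate once $M$ is known. The Python sketch below (with an arbitrary positive definite $M$ chosen for illustration) computes $IF(x,y;S,F_\theta)$ from the value $\eta(x, y - x^T\theta)$, building the pseudoinverse of $\tilde M = \operatorname{diag}(M_{(11)}, 0)$ with `numpy.linalg.pinv`. The test checks the Schur-complement identity $x^T(M^{-1} - \tilde M^+)x = u^T M_{(22.1)}^{-1} u$ with $u = x_{(2)} - M_{(21)}M_{(11)}^{-1}x_{(1)}$ and $M_{(22.1)} = M_{(22)} - M_{(21)}M_{(11)}^{-1}M_{(12)}$, so only the part of $x_{(2)}$ not explained by $x_{(1)}$ enters the influence of position.

```python
import numpy as np

def if_statistic_s(eta_value, x, M, q):
    """Theorem 4.1 (iii): |eta| * {x^T (M^{-1} - M~^+) x / (p - q)}^{1/2}."""
    p = M.shape[0]
    M_tilde = np.zeros_like(M)
    M_tilde[:q, :q] = M[:q, :q]            # M~ = diag(M_(11), 0)
    D = np.linalg.inv(M) - np.linalg.pinv(M_tilde)
    quad = max(float(x @ D @ x), 0.0)      # D is positive semidefinite
    return abs(eta_value) * np.sqrt(quad / (p - q))

rng = np.random.default_rng(6)
G = rng.standard_normal((4, 4))
M = G @ G.T + 4.0 * np.eye(4)              # an arbitrary positive definite M
x = rng.standard_normal(4)
print(if_statistic_s(2.0, x, M, q=2))
```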
As we can see from Theorem 4.1, the influence function corresponding to the F-statistic ($\eta(x, r) = r$) is unbounded in $x$ and $r = y - x^T\theta$. Our goal is to select in the class $\{\tau\}$ a test with a bounded influence function that is as efficient as possible; cf. section 3. To accomplish this we need the distribution of the test statistic defining a $\tau$-test.

Denote by $((T_\Omega)_n)_{(2)}$ the vector of the last $(p-q)$ components of $(T_\Omega)_n$ and let $M_{(22.1)} := M_{(22)} - M_{(21)} M_{(11)}^{-1} M_{(12)}$, where the $M_{(ij)}$ are the submatrices of $M$ corresponding to a $q \times (p-q)$ partition of $M$. Moreover, define $V_n(\theta) := n^{-1/2} \sum_{i=1}^{n} \eta(x_i, y_i - x_i^T\theta)\, x_i$. Then under the given conditions it is possible to show (see Ronchetti (1982a, b)) that the statistics

$$n S_n^2, \qquad (4.12)$$

$$(p-q)^{-1} V_n^T(\theta)\, (M^{-1} - \tilde M^{+})\, V_n(\theta) \qquad (4.13)$$

and

$$(p-q)^{-1}\, n\, ((T_\Omega)_n)_{(2)}^T\, M_{(22.1)}\, ((T_\Omega)_n)_{(2)} \qquad (4.14)$$

have the same asymptotic distribution. More precisely, under the sequence of alternatives

$$\theta^{(j)} = n^{-1/2} \Delta^{(j)}, \qquad j = q+1, \dots, p, \qquad (4.15)$$

these statistics are asymptotically distributed as

$$\sum_{j=q+1}^{p} \lambda_j \big(N_j + (C\Delta)_j\big)^2, \qquad (4.16)$$

where $N = (N_{q+1}, \dots, N_p)^T$ and the $N_j$ are independent univariate standard normal variables, $\lambda_{q+1} \ge \dots \ge \lambda_p > 0$ are the $(p-q)$ positive eigenvalues of $Q(M^{-1} - \tilde M^{+})$, $\Delta = (\Delta^{(q+1)}, \dots, \Delta^{(p)})^T$, and $C$ is the Choleski root of $M_{(22.1)}$ defined by

$$C^T C = M_{(22.1)}. \qquad (4.17)$$

c‘''(M’^QM'^)^22)^  *  H22)  '  . ^P^  '  * 

Remark 2. A related result on the distribution of the likelihood ratio test statistic when the data do not come from the parametric model under consideration was obtained by Kent (1982); cf. also Foutz and Srivastava (1975).

Remark 3. Under the null hypothesis, (4.16) becomes $\sum_{j=q+1}^{p} \lambda_j (\chi_1^2)_j$, where the $(\chi_1^2)_j$ are independent $\chi^2$ random variables with 1 degree of freedom.
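Under $H_0$ the critical value therefore depends on the eigenvalues $\lambda_j$. A simple Monte Carlo sketch in Python (a choice made here; the paper gives no algorithm) approximates the required quantile of $\sum_j \lambda_j (\chi^2_1)_j$. In practice the $\lambda_j$ would be replaced by estimated eigenvalues.

```python
import numpy as np

def weighted_chi2_quantile(lams, level=0.95, draws=200_000, seed=0):
    """Monte Carlo quantile of sum_j lam_j * (chi^2_1)_j (null limit of Remark 3)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((draws, len(lams)))
    stats = (z**2) @ np.asarray(lams, dtype=float)
    return float(np.quantile(stats, level))

print(weighted_chi2_quantile([1.0, 1.0]))   # close to the chi^2_2 value 5.991
print(weighted_chi2_quantile([0.9, 0.6]))
```

With all $\lambda_j = 1$ the ordinary $\chi^2_{p-q}$ quantile is recovered; unequal $\lambda_j$ shift the critical value.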

We are now ready to solve the optimality problem for infinitesimal robustness of tests (see section 3). We look for a $\tau$-test which maximizes the asymptotic power, subject to a bound on the influence function of the test statistic at the null hypothesis. We give the solution to this problem for Huber's regression (that is, assuming $\tau(x, r) = \rho(r)$) and in the general case.

Theorem 4.2. Assume $\tau(x, r) = \rho(r)$. Then the test that maximizes the asymptotic power, under the side condition of a bounded influence of the residual

$$\sup_r |IR(r; S, \Phi)| \le b,$$

is given by Huber's $\rho$-function

$$\rho_c(r) = \begin{cases} r^2/2 & \text{if } |r| \le c, \\ c|r| - c^2/2 & \text{otherwise.} \end{cases}$$

Proof. Under the assumption, $n S_n^2$ is asymptotically distributed as $\lambda \cdot \chi^2_{p-q}(\delta^2)$, where $\lambda = E\psi^2 / E\psi'$, $\psi(r) = d\rho/dr$ and

$$\delta^2 = \frac{(E\psi')^2}{E\psi^2}\, \Delta^T (E x x^T)_{(22.1)}\, \Delta. \qquad (4.18)$$

Therefore the asymptotic power is a monotone increasing function of $\delta^2$. Moreover, the influence function can be factorized into two components, the first one depending only on $r = y - x^T\theta$ (influence of the residual) and the second one depending only on $x$ (influence of position in factor space). Since the first factor equals $|\psi(r)|/E\psi'$, the problem we have to solve is equivalent to minimizing $E\psi^2/(E\psi')^2$ under a bound on $|\psi(r)|/E\psi'$. But then, using Hampel's Lemma 5 (see Hampel, 1974), we find the solution

$$\psi(r) = \psi_c(r) = \begin{cases} r & \text{if } |r| \le c, \\ c \cdot \operatorname{sign}(r) & \text{otherwise,} \end{cases}$$

and this proves the theorem.
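For the $\rho_c$-test the constants in this proof have closed forms at the standard normal: $E\psi_c' = 2\Phi(c) - 1$ and $E\psi_c^2 = 2\Phi(c) - 1 - 2c\varphi(c) + 2c^2(1 - \Phi(c))$, with $\varphi$ the standard normal density. The Python sketch below evaluates $\lambda = E\psi^2/E\psi'$ and the factor $(E\psi')^2/E\psi^2$ that multiplies the LS noncentrality; for $c = 1.345$ the latter is about 0.95, the familiar efficiency of Huber's estimator at the normal model.

```python
from math import erf, exp, pi, sqrt

def huber_moments(c):
    """Return (E psi_c'(Z), E psi_c(Z)^2) for Z ~ N(0, 1)."""
    Phi_c = 0.5 * (1.0 + erf(c / sqrt(2.0)))
    phi_c = exp(-0.5 * c * c) / sqrt(2.0 * pi)
    e_dpsi = 2.0 * Phi_c - 1.0
    e_psi2 = e_dpsi - 2.0 * c * phi_c + 2.0 * c * c * (1.0 - Phi_c)
    return e_dpsi, e_psi2

def huber_lambda(c):
    """lambda = E psi^2 / E psi', the scale factor of the chi^2 limit."""
    e_dpsi, e_psi2 = huber_moments(c)
    return e_psi2 / e_dpsi

def noncentrality_factor(c):
    """(E psi')^2 / E psi^2, the factor relative to the LS noncentrality."""
    e_dpsi, e_psi2 = huber_moments(c)
    return e_dpsi**2 / e_psi2

for c in (0.5, 1.0, 1.345, 2.0, 3.0):
    print(c, round(huber_lambda(c), 4), round(noncentrality_factor(c), 4))
```

As $c \to \infty$ both quantities tend to 1 and the $\rho_c$-test reduces to the F-test with known scale.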

This class of tests was proposed by Schrader and McKean (1977) and Schrader and Hettmansperger (1980) and carries M-estimation over to testing in a natural way. However, if we look at the influence function of the $\rho_c$-test, we see that while the factor depending on $r$ is bounded, the second factor, depending on $x$, tends to $\infty$ as $\|x\| \to \infty$. Therefore, the total influence is unbounded and this test suffers from the same problems as Huber's estimator when there are outliers in the factor space. This justifies the consideration of the more general class of $\tau$-tests.

The $\rho_c$-test can be viewed as a likelihood ratio test when the error distribution has a density proportional to $\exp(-\rho_c(r))$. This distribution minimizes the Fisher information within the gross-error model ("least favorable distribution"). Note that a test of the same type (a likelihood ratio test under a least favorable distribution) is used by Carroll (1980), who proposes a robust method for testing transformations to achieve approximate normality.

Note that the $\rho_c$-test is asymptotically equivalent to a test proposed by Bickel (1976), who applies the classical F-test to transformed observations; see Schrader and Hettmansperger (1980) and Huber (1981), p. 197.

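In order to state the general optimality result, we first need the following lemma.

Lemma. Let $c > 0$. If $E x x^T$ is nonsingular, then

(i) for sufficiently large $c > 0$ there exists a symmetric and positive definite $(p \times p)$ matrix $M_S = M_S(c, p-q)$ which satisfies the equation

$$E\Big\{\Big(2\Phi\big(c\,(p-q)^{1/2} / |x^T (M_S^{-1} - \tilde M_S^{+})\, x|^{1/2}\big) - 1\Big)\, x x^T\Big\} = M_S; \qquad (4.19)$$

(ii) $M_S$ converges to $E x x^T$ when $c \to \infty$;

(iii) denote by $U_S$ the lower triangular matrix with positive diagonal elements such that $M_S = U_S U_S^T$, and define

$$\eta_S(x, r) := \frac{(p-q)^{1/2}}{\|z_{(2)}\|}\, \psi_c\Big(\frac{r\, \|z_{(2)}\|}{(p-q)^{1/2}}\Big)$$

with $z = U_S^{-1} x$. Then $M_S = E\, \eta_S'(x, r)\, x x^T$.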
Proof. Assertions (i) and (ii) can be shown using the same techniques as in Krasker (1981, Proposition 1), noting that $\|M_S\| \le \|E x x^T\|$, where $\|M\| := \dots$ . Assertion (iii) follows using the Choleski decomposition of $M_S$ and (4.19).

Remark 4. The subscript S in $M_S$ indicates that $M_S$ is the matrix $M$ corresponding to $\eta_S(x, r)$; this function is of the Schweppe form (see section 2).

Theorem 4.3. Assume either (i) $q = p-1$ or (ii) the density of $x$ is spherically symmetric with respect to $x_{(2)} = (x^{(q+1)}, \dots, x^{(p)})^T$. Then the test that solves the optimality problem for infinitesimal robustness within the class of $\tau$-tests is defined by a function of the form

$$\tau_S(x, r) = \Big(\frac{\|z_{(2)}\|}{(p-q)^{1/2}}\Big)^{-2} \rho_c\Big(\frac{r\, \|z_{(2)}\|}{(p-q)^{1/2}}\Big) = \rho_{c(x)}(r),$$

where $c(x) := c\,(p-q)^{1/2} / \|z_{(2)}\|$, $z = U_S^{-1} x$, and $U_S$ is the lower triangular matrix (which exists because of the Lemma) such that

$$E\Big\{\Big(2\Phi\big(c\,(p-q)^{1/2} / \|z_{(2)}\|\big) - 1\Big)\, z z^T\Big\} = I. \qquad (4.20)$$
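Since $\tau_S(x,r) = \rho_{c(x)}(r)$, the optimal $\eta_S = \partial\tau_S/\partial r$ is simply Huber's $\psi$ with an $x$-dependent clipping constant, $\eta_S(x,r) = \psi_{c(x)}(r)$. The Python sketch below evaluates it for a given lower triangular $U_S$. Computing the $U_S$ that actually satisfies (4.20) requires solving the fixed-point equation of the Lemma and is not attempted here, so the matrix used is a placeholder. The loop checks the bound $|\eta_S(x,r)|\,\|z_{(2)}\| \le c\,(p-q)^{1/2}$, which is what keeps the influence of the test statistic bounded.

```python
import numpy as np

def eta_s(x, r, U_S, q, c):
    """eta_S(x, r) = psi_{c(x)}(r), c(x) = c (p-q)^{1/2} / ||z_(2)||, z = U_S^{-1} x."""
    p = len(x)
    z = np.linalg.solve(U_S, x)
    norm_z2 = np.linalg.norm(z[q:])
    if norm_z2 == 0.0:                 # c(x) = infinity: no clipping
        return float(r)
    c_x = c * np.sqrt(p - q) / norm_z2
    return float(np.clip(r, -c_x, c_x))

# Placeholder U_S: Choleski factor of an arbitrary positive definite matrix.
rng = np.random.default_rng(8)
G = rng.standard_normal((3, 3))
U_S = np.linalg.cholesky(G @ G.T + 3.0 * np.eye(3))   # lower triangular

worst = 0.0
for _ in range(1000):
    x = rng.standard_normal(3) * 10.0
    r = rng.standard_normal() * 10.0
    z2 = np.linalg.solve(U_S, x)[1:]                    # q = 1
    worst = max(worst, abs(eta_s(x, r, U_S, 1, 1.5)) * np.linalg.norm(z2))
print(worst, 1.5 * np.sqrt(2.0))
```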


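Proof. We show the assertion under condition (i); the proof is similar under (ii). Put

$$M(\eta) = E\, \eta'(x, r)\, x x^T, \qquad Q(\eta) = E\, \eta^2(x, r)\, x x^T,$$

and let $\lambda_p(\eta)$ be the positive eigenvalue of $Q(\eta)\,(M^{-1}(\eta) - \tilde M^{+}(\eta))$. Moreover, denote by $U(\eta)$ the Choleski decomposition of $M(\eta) = U(\eta)\, U^T(\eta)$; here $U(\eta)$ is a lower triangular matrix with positive diagonal elements.

We have to solve the following problem. For a given $b > 0$, find a test which maximizes $M_{(22.1)}(\eta)/\lambda_p(\eta)$ under the side condition

$$\sup_{x,y} |IF(x, y; S, F_\theta)| \le b. \qquad (4.21)$$

Since $U^T (M^{-1} - \tilde M^{+})\, U = I - \tilde I$, where $\tilde I = \operatorname{diag}(1, \dots, 1, 0)$, we obtain

$$\lambda_p(\eta) = (U^{-1} Q\, U^{-T})_{pp} = E\, \eta^2(x, r)\, |(U^{-1}x)^{(p)}|^2, \qquad (4.22)$$

$$M_{(22.1)}(\eta) = (U_{pp}(\eta))^2. \qquad (4.23)$$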

Moreover,  (4.21)  becomes 


(4.24) 
Choose $c > 0$ such that $b = c/(U_S)_{pp}$, where $U_S$ is defined by Lemma (iii) (this $c$ exists because $c^2/(U_S)_{pp}^2 \ge c^2/\operatorname{tr} M_S \ge c^2/E\|x\|^2 \to \infty$ when $c \to \infty$), and assume, without loss of generality,

$$U_{pp}(\eta) = (U_S)_{pp}. \qquad (4.25)$$

(The multiplication of the test statistic by a positive constant does not change the test!) Combining (4.22), (4.23), (4.24) and (4.25), the original problem reduces to minimizing $E\, \eta^2(x, r)\, |(U^{-1}x)^{(p)}|^2$ under the conditions (4.25) and

$$\sup_{x,r} |\eta(x, r)| \cdot |(U^{-1}x)^{(p)}| \le c. \qquad (4.26)$$


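Now,

$$E\, \eta^2(x, r)\, |(U^{-1}x)^{(p)}|^2 = -E\, r^2\, |(U_S^{-1}x)^{(p)}|^2 + 2\, E\big(\eta(x, r)\, r\, (U^{-1}x)^{(p)} (U_S^{-1}x)^{(p)}\big) + E\big(\eta(x, r)\,(U^{-1}x)^{(p)} - r\,(U_S^{-1}x)^{(p)}\big)^2,$$

and

$$E\big(\eta(x, r)\, r\, (U^{-1}x)^{(p)} (U_S^{-1}x)^{(p)}\big) = \big(U^{-1} E(\eta(x, r)\, r\, x x^T)\, U_S^{-T}\big)_{pp},$$

since $(U^{-1}x)^{(p)} (U_S^{-1}x)^{(p)} = (U^{-1} x x^T U_S^{-T})_{pp}$, and, integrating by parts with respect to the standard normal distribution of $r$,

$$\big(U^{-1} E(\eta(x, r)\, r\, x x^T)\, U_S^{-T}\big)_{pp} = \big(U^{-1} E(\eta'(x, r)\, x x^T)\, U_S^{-T}\big)_{pp} = (U^{-1} M\, U_S^{-T})_{pp} = (U^{-1} U U^T U_S^{-T})_{pp} = (U^T U_S^{-T})_{pp} = U_{pp}/(U_S)_{pp} = 1,$$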

where in the last equalities we have used (4.25) and $U_{(12)} = 0$. Thus, minimizing $E\, \eta^2(x, r)\, |(U^{-1}x)^{(p)}|^2$ under the conditions (4.25) and (4.26) is equivalent to minimizing

$$E\big\{\eta(x, r)\,(U^{-1}x)^{(p)} - r\,(U_S^{-1}x)^{(p)}\big\}^2$$

subject to (4.26). Clearly, the optimal $\eta^*$ must satisfy

$$\eta^*(x, r)\, \big(U^{-1}(\eta^*)\, x\big)^{(p)} = \psi_c\big(r\, (U_S^{-1}x)^{(p)}\big).$$

Therefore,

$$\eta_S(x, r) = |z_{(2)}|^{-1}\, \psi_c\big(r\, |z_{(2)}|\big),$$

where $z = U_S^{-1} x$ (here $z_{(2)} = z^{(p)}$), solves this extremal problem. Any other solution defines a test which has the same influence function and the same asymptotic power and in this sense is equivalent to $\eta_S$. This completes the proof.
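The integration by parts used above is Stein's identity $E[\eta(x,r)\,r \mid x] = E[\eta'(x,r) \mid x]$ for standard normal $r$. A quick numerical check in Python for $\eta = \psi_c$ follows; the midpoint grid is a choice made here, and because $\psi_c'$ is an indicator the quadrature of the right-hand side is accurate only to roughly $10^{-4}$.

```python
import numpy as np
from math import erf, sqrt

def stein_check(c, cells=100_000, half_width=12.0):
    """Return (E[psi_c(Z) Z], E[psi_c'(Z)], 2 Phi(c) - 1) for Z ~ N(0, 1)."""
    edges = np.linspace(-half_width, half_width, cells + 1)
    z = 0.5 * (edges[:-1] + edges[1:])
    w = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi) * (edges[1] - edges[0])
    lhs = float(np.sum(w * np.clip(z, -c, c) * z))   # E[psi_c(Z) Z]
    rhs = float(np.sum(w * (np.abs(z) < c)))         # E[psi_c'(Z)]
    exact = erf(c / sqrt(2.0))                       # 2 Phi(c) - 1
    return lhs, rhs, exact

for c in (0.5, 1.345, 3.0):
    print(c, stein_check(c))
```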

Note that the $\eta$-functions defining optimally bounded-influence tests and optimally bounded-influence estimators are of the same form (see Theorem 4.3 and (2.6)), namely of Schweppe's form. They differ only in the weights: the optimal weights for the test take into account that (after standardization) only the last $(p-q)$ components are of interest for the testing problem.

From these optimally robust tests one can derive robust confidence regions for the parameters and a robust version of stepwise regression. These procedures can easily be implemented in a package for robust regression. In particular, it is planned to integrate them into ROBETH, a package of robust linear regression programs which has been written at the ETH Zurich and is still under development; see Marazzi (1980).


5.  TESTS  BASED  ON  QUADRATIC  FORMS  OF  ROBUST  ESTIMATORS 


Let $T_n$ be a general M-estimator defined by (2.1). Huber (1967) shows that $n^{1/2}(T_n - \theta)$ is asymptotically normal with mean 0 and asymptotic covariance matrix

$$V = B^{-1} A B^{-T}, \qquad (5.1)$$

where $B = -E\, \partial\Psi/\partial\theta$ and $A = E\, \Psi \Psi^T$. Partition the matrix $V$ into $q \times q$, $q \times (p-q)$, $(p-q) \times q$, $(p-q) \times (p-q)$ blocks and denote them by $V_{(11)}$, $V_{(12)}$, $V_{(21)}$, $V_{(22)}$, respectively. Moreover, denote by $(T_n)_{(1)}$ and $(T_n)_{(2)}$ the vectors with the first $q$ components and the last $(p-q)$ components of $T_n$, respectively.

Definition 5.1. Let $T_n$ be a general M-estimator defined by (2.1). Then the test statistic

$$R_n^2 := (p-q)^{-1}\, ((T_n)_{(2)})^T\, V_{(22)}^{-1}\, (T_n)_{(2)}$$

defines a class of tests for testing the hypothesis $H_0$ (see (4.2)). "Large" values of $R_n^2$ are significant.

From Definition 5.1 we see that $R_n^2$ is a quadratic form of the estimator $(T_n)_{(2)}$ with respect to its asymptotic covariance matrix. Therefore, under the conditions of Huber (1967) and (4.15), $(p-q)\, n R_n^2$ has asymptotically a $\chi^2$ distribution with $p-q$ degrees of freedom and noncentrality parameter

$$\delta^2 = \Delta^T V_{(22)}^{-1} \Delta. \qquad (5.2)$$
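A Python sketch of Definition 5.1 for estimators of the form (2.2), with the sandwich (5.1) estimated by the plug-in averages $B = n^{-1}\sum \eta'(r_i)\,x_i x_i^T$ and $A = n^{-1}\sum \eta^2(r_i)\,x_i x_i^T$ (a choice made here for illustration). As a sanity check, with the LS estimator and the classical covariance $\hat\sigma^2 (n^{-1}X^TX)^{-1}$ in place of the sandwich, $nR_n^2$ coincides exactly with the classical F statistic, which is the Wald test mentioned below.

```python
import numpy as np

def sandwich_cov(X, r, eta, deta):
    """Plug-in V = B^{-1} A B^{-T} of (5.1) for Psi(x, y; theta) = eta(r) x."""
    n = X.shape[0]
    B = (X * deta(r)[:, None]).T @ X / n          # -E dPsi/dtheta
    A = (X * (eta(r) ** 2)[:, None]).T @ X / n    # E Psi Psi^T
    B_inv = np.linalg.inv(B)
    return B_inv @ A @ B_inv.T

def wald_r2(T, V, q):
    """Definition 5.1: R_n^2 = (p-q)^{-1} (T_(2))^T V_(22)^{-1} T_(2)."""
    p = len(T)
    T2 = T[q:]
    return float(T2 @ np.linalg.solve(V[q:, q:], T2)) / (p - q)

rng = np.random.default_rng(9)
n, q = 80, 1
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 0.4, -0.2]) + rng.standard_normal(n)
T = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ T

# LS with the classical covariance sigma^2 (X^T X / n)^{-1}:
sigma2_hat = float(r @ r) / (n - 3)
V_classical = sigma2_hat * np.linalg.inv(X.T @ X / n)
print(n * wald_r2(T, V_classical, q))

# LS with the sandwich (5.1), eta(r) = r, eta'(r) = 1:
V_sandwich = sandwich_cov(X, r, eta=lambda r: r, deta=lambda r: np.ones_like(r))
print(n * wald_r2(T, V_sandwich, q))
```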
Under the conditions of Theorems 4.2 and 4.3 and in view of (4.12) and (4.14), it follows easily that likelihood ratio-type tests and tests based on $R_n^2$ are asymptotically equivalent. However, at least in some situations, the latter seem to have a more liberal small sample behaviour; cf. Schrader and Hettmansperger (1980).

Remark 1. A class of tests based on quadratic forms of robust estimators was proposed by Stahel (1981) and Samarov and Welsch (1982) in the case $q = p-1$ and for M-estimators of the form (2.2). Schrader and Hettmansperger (1980) consider the same class of tests in the special case $\eta(x, r) = \psi(r)$.

Let us now compute the influence function of this new class of tests. Since the computations are similar to those in Theorem 4.1, we omit them. Note only that we compute the influence function of $R$ and not of $R^2$, the latter being identically zero. This does not affect the test, since $R$ and $R^2$ define the same test. The influence function of $R$ at the model $F_\theta$ is given by


$$\mathrm{IF}(x,y;R,F_\theta) = \left(\frac{z^T V_{(22)}^{-1} z}{p-q}\right)^{1/2}, \qquad (5.3)$$

where $z = \mathrm{IF}(x,y;T_{(2)},F_\theta)$ is the influence function of the estimator $T_{(2)}$ at the model $F_\theta$. Thus we have

$$\sup_{x,y}\,\left|\mathrm{IF}(x,y;R,F_\theta)\right| = (p-q)^{-1/2}\,\gamma^*(T_{(2)},F_\theta), \qquad (5.4)$$

where $\gamma^*$ is the self-standardized sensitivity of the estimator $T_{(2)}$; cf. Krasker and Welsch (1982).
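To make (5.3) and (5.4) concrete, the sketch below evaluates the influence function of $R$ at a point, given $z$ and $V_{(22)}^{-1}$. It then applies it to a hypothetical toy model (our assumption, not taken from the paper): a straight-line fit $y = \theta_1 + \theta_2 u + e$ with $u \sim N(0,1)$, $e \sim N(0,\sigma^2)$, testing $\theta_2 = 0$ with the LS estimator. There $E[xx^T] = I$ for $x = (1,u)^T$, so the LS influence function is $x r$, $V = \sigma^2 I$, and $z = u r$; the influence of $R$ is therefore $|ur|/\sigma$, which is unbounded, so $\gamma^* = \infty$.

```python
import math
from typing import List, Sequence


def if_r(z: Sequence[float], v22_inv: List[List[float]]) -> float:
    """(5.3): IF(x, y; R, F_theta) = (z^T V_(22)^{-1} z / (p - q))^{1/2}.

    z is IF(x, y; T_(2), F_theta) at the point (x, y); v22_inv is V_(22)^{-1}.
    """
    k = len(z)  # k = p - q
    quad = sum(z[i] * v22_inv[i][j] * z[j] for i in range(k) for j in range(k))
    return math.sqrt(quad / k)


def ls_if_r(u: float, r: float, sigma: float = 1.0) -> float:
    """Hypothetical toy model y = theta_1 + theta_2 * u + e, H_0: theta_2 = 0.

    With x = (1, u), u ~ N(0, 1): E[x x^T] = I, so the LS influence function
    is x * r and V = sigma^2 I; hence z = u * r and V_(22)^{-1} = 1 / sigma^2.
    """
    return if_r([u * r], [[1.0 / sigma ** 2]])


# The influence of the LS-based statistic grows without bound in |u * r|,
# so the supremum in (5.4) is infinite.
for scale in (1.0, 10.0, 100.0):
    print(scale, ls_if_r(scale, scale))
```

Both a large residual and a large design value $u$ inflate the influence, which is why bounding it requires controlling $x$ as well as $r$.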

Note again that the Wald test, which is defined through the test statistic $R_n^2$ when $T_n$ is the LS estimator, has an unbounded influence function. Therefore our goal is to find a test based on $R_n^2$ that maximizes the asymptotic power under a bound on the influence function (5.3). In view of (5.2) and (5.4), this problem is equivalent to the following estimation problem:

Find a function $\Psi$ defining an M-estimator $T$ that minimizes the asymptotic covariance matrix $V_{(22)}$ of $T_{(2)}$, subject to a bound on the self-standardized sensitivity of $T_{(2)}$.
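Why minimizing $V_{(22)}$ is the right target can be seen from the asymptotic power. The sketch below assumes contiguous alternatives $\theta_{(2)} = \Delta/\sqrt{n}$ ($\Delta$ is our notation here) and the standard result that $(p-q)nR_n^2$ is then asymptotically noncentral $\chi^2$ with $p-q$ degrees of freedom and noncentrality $\lambda = \Delta^T V_{(22)}^{-1}\Delta$. The noncentral tail is computed as a Poisson$(\lambda/2)$ mixture of central tails; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def chi2_sf(x: float, k: int) -> float:
    """Upper tail of a central chi-square with integer k >= 1 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    q, start = (0.0, 0) if k % 2 == 0 else (math.erfc(math.sqrt(x / 2.0)), 1)
    for j in range(start, k, 2):
        # Q(x; j + 2) = Q(x; j) + (x/2)^(j/2) exp(-x/2) / Gamma(j/2 + 1)
        q += math.exp(0.5 * j * math.log(x / 2.0) - x / 2.0 - math.lgamma(0.5 * j + 1.0))
    return q


def chi2_isf(alpha: float, k: int) -> float:
    """Critical value c with chi2_sf(c, k) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, k) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, k) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x: float, k: int, lam: float, terms: int = 400) -> float:
    """Noncentral chi-square upper tail as a Poisson(lam/2) mixture of central tails."""
    if x <= 0.0:
        return 1.0
    if lam <= 0.0:
        return chi2_sf(x, k)
    q = chi2_sf(x, k)  # Q(x; k + 2j) for the current j
    total = 0.0
    for j in range(terms):
        weight = math.exp(-lam / 2.0 + j * math.log(lam / 2.0) - math.lgamma(j + 1.0))
        total += weight * q
        m = k + 2 * j  # advance Q(x; m) to Q(x; m + 2)
        q += math.exp(0.5 * m * math.log(x / 2.0) - x / 2.0 - math.lgamma(0.5 * m + 1.0))
    return total


def asymptotic_power(delta: Sequence[float], v22_inv: List[List[float]],
                     alpha: float = 0.05) -> float:
    """Power of the level-alpha test under the assumed alternative
    theta_(2) = delta / sqrt(n), with noncentrality delta^T V_(22)^{-1} delta."""
    k = len(delta)
    lam = sum(delta[i] * v22_inv[i][j] * delta[j] for i in range(k) for j in range(k))
    return ncx2_sf(chi2_isf(alpha, k), k, lam)
```

Halving $V_{(22)}$ doubles $V_{(22)}^{-1}$ and hence the noncentrality, so for every $\Delta$ the power can only increase; this is the sense in which a smaller asymptotic covariance of $T_{(2)}$ yields a better test.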

It  turns  out  that  it  is  impossible  to  find  an  M-estimator  that  minimizes 
in  the  strong  sense  the  asymptotic  covariance  matrix,  subject  to  a  bound 
on  the  self-standardized  sensitivity.  A  counterexample  can  be  found  in 
Stahel  (1981).  However,  the  following  admissibility  result  can  be  shown. 

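Definition 5.2 A test defined by $\Psi^*$ dominates a test defined by $\Psi$ if $\text{as.power}_{\Psi^*}(\Delta) \ge \text{as.power}_{\Psi}(\Delta)$ for all $\Delta$, and there is a $\Delta$ such that $\text{as.power}_{\Psi^*}(\Delta) > \text{as.power}_{\Psi}(\Delta)$.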

Definition 5.3 A test defined by $\Psi$ is called admissible if there is no test that dominates it.
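Under the same assumed noncentral $\chi^2$ approximation, Definitions 5.2 and 5.3 reduce to a matrix comparison. Because the power increases strictly with the noncentrality $\Delta^T V_{(22)}^{-1}\Delta$, and both tests have $p-q$ degrees of freedom, the test with covariance $V^*_{(22)}$ dominates the one with $V_{(22)}$ exactly when $V_{(22)} - V^*_{(22)}$ is positive semidefinite and nonzero (both matrices assumed positive definite). The hypothetical helpers below check this through principal minors, which is only practical for small $p-q$.

```python
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def det(a: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in a]
    n = len(m)
    d = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            d = -d  # a row swap flips the sign of the determinant
        d *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return d


def is_psd(a: Matrix, tol: float = 1e-10) -> bool:
    """A symmetric matrix is PSD iff every principal minor is >= 0."""
    n = len(a)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[a[i][j] for j in idx] for i in idx]
            if det(sub) < -tol:
                return False
    return True


def dominates(v_star: Matrix, v: Matrix, tol: float = 1e-10) -> bool:
    """True if the test with V*_(22) = v_star dominates the one with V_(22) = v,
    i.e. v - v_star is PSD and nonzero (Loewner order)."""
    n = len(v)
    diff = [[v[i][j] - v_star[i][j] for j in range(n)] for i in range(n)]
    nonzero = any(abs(diff[i][j]) > tol for i in range(n) for j in range(n))
    return nonzero and is_psd(diff, tol)
```

Two tests whose matrices are not ordered, such as $\mathrm{diag}(0.5, 2)$ and $I$, dominate neither each other; this is why an admissibility result is the natural substitute when no strongly optimal estimator exists.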

Theorem 5.1 For a given $c > 1$, let $C_c(\Psi)$ be the class of tests given in Definition 5.1 such that $\sup_{x,y}|\mathrm{IF}(x,y;R,F_\theta)| \le c$. Then the test defined by the following function is admissible within $C_c(\Psi)$:
$$\Psi^*_{(1)}(x,y;\theta) = r \ldots,$$

where $\Psi^*_{(1)}$ and $\Psi^*_{(2)}$ denote the first $q$ and last $(p-q)$ components of $\Psi^*$, respectively, $r = y - x^T\theta$, $z = Dx$, and $D$ is defined implicitly by the matrix equation

$$E\left[\Psi^*(x,y;\theta)\, r\, x^T\right] = I.$$

Proof. In view of (5.2) and (5.4), the testing problem is equivalent to that of finding an M-estimator $T$ which is admissible with respect to the asymptotic covariance matrix $V_{(22)}$ under the condition $(p-q)^{-1/2}\gamma^*(T_{(2)},F_\theta) \le c$. This estimation problem is a special case of the problem solved in Stahel (1981, p. 40) (cf. also Hampel, Ronchetti, Rousseeuw, Stahel (1984), Theorem 1, section 4.4), and the result follows.

If we restrict ourselves to the class (2.2), under the conditions of either Theorem 4.2 or Theorem 4.3, it is easy to see that the test defined by $\Psi^*$ is asymptotically equivalent (that is, it has the same asymptotic power and the same influence function) to the corresponding optimally robust test given in those theorems. Therefore, under those conditions, the test defined by $\Psi^*$ is in fact optimal in the strong sense.
6. EXAMPLE

The following example, based on a real data set, should give an idea of the small sample behaviour of our optimally robust tests. The data are taken from Draper and Smith (1966, p. 104 ff.). We have the following variables:

$Y$ = response, or number of pounds of steam used per month,

$X_8$ = average atmospheric temperature in the month (in °F),

$X_6$ = number of operating days in the month.

Table  1  shows  the  data. 

We  consider  the  linear  model 


$$Y = \alpha + \beta_1 X_8 + \beta_2 X_6 + \varepsilon,$$


and  we  want  to  test  the  hypothesis 


The factor space is given by Figure 1, and the observations are plotted in Figure 2a.

From Figure 1 we see that there are two outliers in the factor space. We want to study the behaviour of the $\log_{10}$ P-values of the F-, $\rho_c$- and optimal $\tau$-tests when the observation $y$ corresponding to the point ($X_8 = 74.4$, $X_6 = 11$) varies between 0 and 20. (Its actual value is 6.36.)
The tests under study are defined by the following functions:

Test              $\tau(x,r)$
F                 $r^2/2$
$\rho_c$          $\rho_{c_1}(r)$ (see Theorem 4.2)
optimal $\tau$    $\rho_{c_2}$-type function of $r$, with $z_{(2)} = (Ux)_{(2)}$ (see Theorem 4.3)

The scale parameter $\sigma$ was estimated in the full model using Huber's Proposal 2. Note that in this case the optimal $\tau$-test is equivalent to the corresponding test based on quadratic forms given in section 5. The constants $c_1$ and $c_2$ were chosen such that the corresponding tests have a given efficiency, say .95, at the normal model (that is, when the $x_i$ and $e_i$ are normally distributed). We obtained the following values:

$$c_1 = 1.345, \qquad c_2 = 2.67.$$
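The value $c_1 = 1.345$ is the familiar tuning constant at which Huber's $\psi_c$ reaches 95% asymptotic efficiency at the standard normal. The sketch below checks this with the closed-form efficiency $(E\psi_c')^2/E\psi_c^2$ and also defines Huber's $\rho_c$ ($r^2/2$ for $|r| \le c$, $c|r| - c^2/2$ otherwise), the function behind the $\rho_c$-test. It assumes that the efficiency of the $\rho_c$-test equals that of the corresponding Huber M-estimator; $c_2 = 2.67$ for the optimal test depends on the design distribution and is not reproduced here. Function names are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def huber_rho(r: float, c: float) -> float:
    """Huber's rho_c: quadratic in the centre, linear in the tails."""
    a = abs(r)
    return 0.5 * r * r if a <= c else c * a - 0.5 * c * c


def huber_efficiency(c: float) -> float:
    """Asymptotic efficiency (E psi_c')^2 / E psi_c^2 of Huber's psi_c at N(0, 1)."""
    e_dpsi = 2.0 * norm_cdf(c) - 1.0  # E psi_c' = P(|e| <= c)
    # E psi_c^2 = integral of e^2 over |e| <= c, plus c^2 P(|e| > c)
    e_psi2 = e_dpsi - 2.0 * c * norm_pdf(c) + 2.0 * c * c * (1.0 - norm_cdf(c))
    return e_dpsi ** 2 / e_psi2
```

As $c \to \infty$ the efficiency tends to 1 and $\rho_c(r) \to r^2/2$, recovering the F-test; smaller $c$ trades efficiency at the normal for robustness.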

Figure 2b shows the overall excellent behaviour of the $\tau$-test (strongly significant for all $y$), the good behaviour of the $\rho_c$-test, which is still significant (at the 5% level) in the region $y > 8$, and the poor behaviour of the F-test, which is not even significant for $8.7 < y < 18.7$.
7. FURTHER RESEARCH DIRECTIONS


Approximate critical regions for the optimally robust tests derived in this paper can be obtained from the asymptotic distribution of the test statistic; see (4.16) and (5.2). This approximation can be improved in two ways.

The first possibility is to use $S_n^2$ or $R_n^2$, defined in sections 4 and 5 respectively, as the test statistic for a permutation test. This would guarantee, on the one hand, an exact level-$\alpha$ test (a property of permutation tests) and, on the other hand, a high robustness of efficiency (a property of $S_n^2$ and $R_n^2$). This idea has been applied to construct a confirmatory test in connection with a Swiss hail experiment; see Hampel, Schweingruber, Stahel (1982). Some work is needed to justify this combined procedure from a theoretical point of view; cf. Lambert (1982).
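A minimal sketch of the permutation idea, under a simplifying assumption: the reduced model contains only an intercept, so under $H_0$ the responses are exchangeable and permuting them gives an exact level-$\alpha$ Monte Carlo test with P-value $(1 + \#\{T^* \ge T\})/(B+1)$. For a general nested hypothesis with nuisance regression parameters this plain permutation is not exact, which is the theoretical gap mentioned above. The statistic is passed in as a function; `squared_correlation` is a non-robust placeholder where $S_n^2$ or $R_n^2$ would be plugged in. All names are hypothetical.

```python
import random
from typing import Callable, Sequence

Statistic = Callable[[Sequence[float], Sequence[float]], float]


def permutation_p_value(x: Sequence[float], y: Sequence[float], statistic: Statistic,
                        n_perm: int = 999, seed: int = 0) -> float:
    """Monte Carlo permutation P-value; large values of `statistic` are significant."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    y_perm = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # a random reassignment of responses to design points
        if statistic(x, y_perm) >= observed:
            count += 1
    # Counting the observed arrangement keeps the test valid under exchangeability.
    return (count + 1) / (n_perm + 1)


def squared_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Placeholder statistic (NOT robust)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Because the statistic is a parameter, the same routine would accept a robust $S_n^2$ or $R_n^2$ unchanged, so the exact level comes from the permutation step and the robustness of efficiency from the statistic.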

The second way is to find better approximations to the distribution of $S_n^2$ and $R_n^2$. A promising approach is small sample asymptotics, which has been used successfully in the location case; see Hampel (1973b), Field and Hampel (1982). This is the subject of ongoing research.
Figure 1: The variables $X_8$ and $X_6$ (factor space).